home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
C/C++ Interactive Reference Guide
/
C-C++ Interactive Reference Guide.iso
/
c_ref
/
csource5
/
357_01
/
cstar1.exe
/
DESIGN.DOC
< prev
next >
Wrap
Text File
|
1991-06-20
|
17KB
|
426 lines
DESIGN.DOC: Design issues pertaining to the CSTAR translator
June 25, 1991
The following notes were written in 1987, I believe. The subject of these
notes are various issues concerning the design of the CSTAR translator, and
in some cases the CSTAR language. Be warned: these notes are extremely
technical in nature and assume a deep level of knowledge of the CSTAR
translator. They are likely to be of interest only to *serious* students of
CSTAR.
Parts of this file have the flavor of cryptic "notes to myself." If you have a
question, please don't be intimidated by the obscure nature of these notes.
Just give me a call and I'll be happy to tell you what I know. By the way,
much of this file was written by a brilliant programmer named Paul Schick,
who worked on CSTAR for about a year with me. Parts of this file are in
dialog form, with my comments being marked by EKR:
Another way of looking at this file is as pages from an engineering
notebook. Some of the issues discusses here never really got satisfactorily
resolved. The section called goals, principles and problems was written by
me, and the rest of the file is really a discussion of the problems in meeting
those goals and principles in any kind of clean way.
Some terminology: LHS stands for the Left Hand Side of an assignment
statement and RHS stands for the Right Hand Side. A "star expression" is
an expression containing the * operator. A "loc node" is the part of the
parse tree containing all information for a "location," which usually
corresponds to some identifier. Understanding how loc nodes are created
and used is a key to understanding how the compiler works.
GOALS, PRINCIPLES AND PROBLEMS
The primary goal is to make available all of the efficiency of assembly
language in a reasonable manner.
1. Register allocation must be done in an obvious and efficient way.
2. As much as possible, consistent with other principles, we want to allow
the programmer to experiment with register allocation without rewriting the
form of expressions.
3. The details of the 68000 architecture should be separate from the actual
code. I see no necessity of embedding all the details of which addressing
modes are allowed in which instruction all through the compiler code. This
suggests avoiding flaky rules on expression composition.
4. The code generator ought to function independently of the the precise
details of what kinds of expressions are allowed, but that may be wishful
thinking. The code generation process can be broken down in concept into
three phases:
a) Assign result location to each operator. Once this is done, all other
decisions are completely determined. For binary operators, one operand
will, in general, be a result location from another operator.
b) Determine whether a machine instruction exists to apply the indicated
operator to the two operands and produce the required result. Again, the
previous phase will have tried to ensure that the result location is the same
as one of the arguments.
c) A peephole optimizer pass operates to handle flow-of-control problems.
At present, I do not see a peephole pass being applied to arithmetic
expressions, though I have not really looked into this possibility.
Consider the following expression.
a = b + (c = d + (e = 2 + f)));
The CSTAR translator treats this as exactly equivalent to
e = 2 + f;
c = d + e;
a = b + c;
with the restriction that c may not be a and e may not be c. But C compilers
are free to rearrange the order of evaluation of operands, so additional
restrictions are needed to ensure that the corresponding C expression always
is evaluated correctly. In the above example, d should not be e and b should
not be c. Isn't the assignment a sequence point?
ON PREDICTABILITY
We are entitled to predict what we have specified, and no more (or not very
much more.) It seems to me, right now, that a statement like:
m1 = (m2*m3) + (m4*m5);
really is, even in CSTAR a specification which says:
1. put the LHS into the RHS and don't tell me how you did it.
2. don't touch anything except the authorized temporaries.
I'm really not entitled to know more, because I didn't say more.
Now, in practice, I ought to help myself to a little bit more, and say that as
long as every non-assignment operator corresponds to an assignment
operator, one for one, e. g.:
m1 = (d2 = m2 * m3) + (d3 = m4 + m5);
then there really aren't any choices left (except for whatever moves are
necessary to reconcile operands and results), so now--and only now--I
haven't asked for any surprises, and so I'm entitled to not get them.
EKR: doesn't this still leave open the question of whether to use the LHS
as a temporary or not?
What's new and useful is that I have that choice. I didn't have that choice in
full C, because I can only specify these things loosely in full C, and the
compiler is entitled to ignore me. I didn't have that choice in macro
assembler, but for a different reason: I was obliged to specify everything all
the time, whether I needed to or not, which is why programming was so
impossibly tedious.
So I don't want to give up the right to specify register designators, and to
know how function calls are done, and go back to C, because I want the
option to handle those things in a specific, assembly-like way. Nor do I to
eschew all compiler decisions all the time and go back to macro assembler,
because I want relief from the burden of specifying everything all the time,
needed or not.
In a way, I want in-line assembly code, but in-line assembly code that
works and is workable. Normally it is not workable, because the register
designators can't be treated as variables, so you have to go through a big
song and dance with lots of overhead in order to feed something to the in-
line code, and then another big song and dance to get it out again; nobody
could ever optimize anything that way.
It might be helpful to have the compiler issue a message if a "ternary"
assignment, etc., compiles to a surprise. The programmer would be
informed about an incorrect assumption (possibly resulting from an
overlooked instruction-set quirk), and we'd have to decide what a
"surprise" is, or whether there's any such thing.
HOW CAN WE EXPERIMENT WITH REGISTER/MEMORY ALLOCATION?
We can't, unless the rules for composing expressions are blind to register
vs. memory allocations. They need not be blind with respect to pointers vs.
nonpointers, since that is a question of type.
For assignments, there is always at least one data temporary register and
one address temporary register, as specified in the command line or in the
function. We will refer to the first pair as as and ds for convenience. The
programmer should act as though all of these scratch registers are clobbered
by any assignment. Any assumption to the contrary may not be portable,
which may be a good reason to allow for specifying a temporary-count
individually for each function. (E.g. it might be that it is infeasible to put a
result into local memory without loading a component of the address into
the address temporary.)
On the 68000, and other processors that distinguish address and data
registers, the type of a hardware address register shall always be pointer.
On the 8086/80186/80286, this is guaranteed to get weird.
The type of the result of an operator can be calculated in a machine
independent way as the tree for the expression is built bottom up by the
reduce() routine. The rules for doing this depend only on the rules stated in
K & R.
Functions which produce a non-pointer result leave that result in ds.
Functions which produce a pointer result leave that result in as.
Temporaries will be used in a canonical way in coding any expression
which invokes them. The compiler will do no rearrangements, and will
evaluate operands by order of association unless there are contradicting
parentheses. In order to enhance speed and simplify code generation, a data
register will be used in lieu of the LHS to accumulate the result, if the LHS
is in memory and the RHS is nontrivial. Complex expressions which are
speed- or space-critical should (as a matter of style) either be decomposed,
or have intermediate assignments explicitly specified on the RHS.
In CSTAR, stack temporaries are not used, so there is no indefinitely
extensible set of temporaries available. It is therefore advisable to keep
expressions as simple as possible consistent with readability, in order to
know that there are enough temporaries.
In the simple, ternary assignment statement, the LHS will not be used as an
accumulator [if there is any alternative], [or maybe never at all, but what
about exclusive-or?] because of the following:
m = d1 + d2;
option1: treat as:
m = d1;
m += d2; one move, one add, two addresses, 32 clocks
option 2: treat as:
m = dt = d1 + d2;
dt = d1;
dt += d2;
m = dt; two moves, one add, one address, 20 clocks
Most assignments will have implied moves under the covers since the
processors generally require results to be placed in at least one, and possibly
in both, operands, while the operator form actually written in source code
implies no such thing. If speed or compactness is highly critical, the
programmer must use forms that are more specific, and correspond more
closely to the hardware. However, such forms tend to be less readable, so
there is no requirement to use them in non-critical code.
EKR: One solution [?] Have assignment operators (+=, *=, etc.) always
produce exactly the indicated code.
STAR EXPRESSIONS
Star expressions arise out of the P> and [ ] operators, but with luck that
should not affect this discussion.
Star expressions have 3 parts, 1) an address register, 2) a data register and
3) a constant. At least one of these parts must be present. The parser will
probably rearrange the parse tree so that the unary * operator has these 3
subtrees under it.
Before the parser rearranges the tree, the operand of a unary * operator is
either:
1) another unary * operator or
2) a tree containing a star expression whose operands are joined by +
operators (or a P operator if the P operator applies to the constant.)
The a0 register is used to do multiple indirections as required. For example,
the star expression ***a5 generates
a0 = *a5;
a0 = *a5;
and *a0 is then used as the equivalent operand.
Scaling is done in d0. Scaling is done if the data register is present and the
scale factor is not 1. For example, the star expression
*(a5 + d5)
which is equivalent to
a5 [d5]
is done as
d0 = d5; /* assembly language (no scaling) */
d0 *= scale_factor;
a0 = a5 + d0
and then *a0 is used as the equivalent operand. Note that scaling by
constants between 2 and 10 (?) is done by masking and shifting instead of
multiplying.
Similarly, the star expression
* (d5 + base_constant)
which is equivalent to
base_constant [d5]
is done as
d0 = d5
d0 *= scale_factor;
a0 = base_constant + d0;
and then *a0 is used as the equivalent operand.
When scaling is not required at run time, i.e., when the data register is not
present or when the scale factor is one, star expressions are equivalent to
68000 addressing modes. For example,
char * a1;
*(a1 + d1 + 5);
the star expression is the same as the equivalent operand and generates the
#5(a1,d1)
addressing mode.
One could, in theory, allow more complex star expressions by substituting
an inner expression for the data register. For example:
* (a1 + (d1 = *a2) + 5);
This would be unwrapped as:
d1 = *a2;
*(a1 + d1 + 5);
At present, I see no particular problem with allowing this generality.
Indeed, one could use d0 as an implicit d register and write
*(a1 + *a2 + 5);
which would be the same as
d0 = *a2;
*(a1 + d0 + 5);
It turns out that finding the three separate parts of a star expression is not
difficult. At the top level of a tree for the star expression, look for a subtree
with pointer type. That subtree contains the address register. If the subtree
contains a binary operator, one subtree of the binary operator will contain
the pointer and the other will contain the data register and/or the constant.
Again, if that subtree contains a binary operator, one subtree must contain
the data register and the other subtree must contain the constant or constant
expression.
WHEN ARE EXPRESSIONS TOO COMPLEX?
When we run out of temporaries given the expression and the current
settings.
Associated with each assignment is a list of forbidden registers which may
not appear in the RHS of the assignment. This list is created by the
unwrapping program. Actually, it's probably more complicated than this.
There must be some canonical tools.
The result of a subtree need not be either the LHS or the result temporary if
the subtree consists of a primitive. In that case, that subtree does not limit
in any way the form of the complexity of the other subtree. Eh?
REWRITING RULES
An expression of the form
a = arg1 op1 arg2 op2 ... opn argn+1
is, by definition, equivalent to
a = arg1
a op1 = arg2
a opn = argn+1
I don't know anything about this from here down.
As the result of this equivalence, the LHS becomes a hidden common
subexpression. The LHS temporary may be used to store this hidden
common subexpression. For example, if the LHS temporary is available,
the expression
***a5 = a + b;
generates
a1 = **a5;
*a1 = a;
*a1 += b;
If the LHS temporary is not used, the following code would have to be
generated (In general, *a0 might be the effective operand for a or b).
a1 = *a5;
a1 = *a5;
*a1 = a;
a1 = *a5;
a1 = *a5;
*a1 += b;
In actuality, the expression ***a5 = a + b would not be allowed if the LHS
temporary is not used.
Another way around this problem would be to write the expression this way
***a5 = d0 = a + b;
In general, if the effective LHS of an expression is *an, the only
non-primitive operators allowed on the RHS are + P and ^.
Rewriting rules move a lot of "code generation" into the parser.
MANAGEMENT OF LOC_NODES DESIGNATING SIMPLE REGISTERS
The management of loc_nodes is not as simple as we once thought it was.
At this time, in fact, they are not managed at all, but created at will
in great profusion. The essence of the problem is that because of the
way the parser makes loc_nodes, it is permissible to alter them at will--
until they get attached to the code tree. Once they get attached to the
code tree, they can no longer be altered because to do so would cause
retroactive changes earlier in the code. This fact underlaid our bafflement
at the initial behavior of the code generator.
Two techniques are proposed to minimize proliferation, and are already
provided for (and a compile-time switch should switch them out, to help
us track down the inevitable bugs!) The simpler one is to mark nodes when
they get attached to the code list, so that conditional routines like
locn_xdupl need not copy them unless they are marked. In most cases,
this means that extra copies won't be made except when inner-level
assignments are used as values, as in a = b = c;
The more complex one involves the locn_chmod routine, which is meant to
manage a set of simple register and register-indirect nodes. These
permanent nodes will be marked, so that they never get altered, and
routines like locn_chmod will operate by substituting one for another.
This will limit most copying of temporaries to cases where nodes are
actually combined, rather than just referred to.
Provision is also made for locn_xconst to do something similar if it
turns out that we are generating lots of internal ones. Note that we
may want to improve the folding routines in the parser eventually to
eliminate the creation of superfluous zeros, in particular. These
nodes do need to carry their types, at least transitorily, but perhaps
there is some practical way to work this out.
VARIABLES AND VARIABLE NAMES
Variables are set up in the parser to be suitable for their intended
storage class. This fact implies some coupling of the parser to the
code generator; the parser is not totally machine-independent.
There is as yet no mechanism for properly determining the range of
a symbolic name; that is, whether it can be combined into an address
mode or not, although portions are in place. In particular, globals
have to be always regarded as out-of-range, since the linker may place them
anywhere at its own option.